Feature string-based intelligent information retrieval from Tamil document images
Internal identifier: 000A82 (Main/Exploration)
Authors: S. Abirami [India]; D. Manjula [India]
Source:
- International journal of computer applications in technology [0952-8091]; 2009.
French descriptors
- Pascal (Inist)
- Chaîne caractère, Recherche information, Recherche documentaire, Langage naturel, Texte, Reconnaissance image, Image optique, Reconnaissance optique caractère, Reconnaissance caractère, Analyse image, Traitement image, Bibliothèque électronique, Lettre alphabet, Procédé extraction, Mot clé, Extraction forme.
- Wicri:
- topic: Recherche documentaire (Document retrieval).
English descriptors
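- KwdEn:
- Character recognition, Character string, Document retrieval, Electronic library, Extraction process, Image analysis, Image processing, Image recognition, Information retrieval, Keyword, Letter, Natural language, Optical character recognition, Optical image, Pattern extraction, Text.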
Abstract
Information retrieval (IR) from document images has become an increasingly popular and challenging problem. This paper proposes a simple and effective method for extracting text and performing intelligent IR on Tamil document images without optical character recognition (OCR). The method generates a feature string for every word image by extracting its features. Instead of recognising the letters, as OCR does, it relies on their basic characteristics, that is, the shapes of the letters. The strength of the technique lies in describing text through basic features, such as lines and the black-and-white disposition rates of characters, which remain almost the same for a given character across font sizes and font faces. In an offline process, the document images are preprocessed, features are extracted from the word images on the basis of their shapes, and the results are stored in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is constructed. IR is then performed on the basis of this primitive string, and the matching images are returned to the user. The technique could easily be adopted for IR in large digital libraries.
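The offline/online pipeline summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-symbol code per character (a horizontal-line flag, a vertical-line flag, and a quantised top/bottom black-pixel balance), the 0.8 and 1.2 thresholds, the in-memory index standing in for the temporary files, and the `glyph_codes` table mapping each keyword character to its primitive code are all hypothetical choices made for illustration. Preprocessing and character segmentation are assumed to have been done already.

```python
from collections import defaultdict

# Hypothetical representation: a binarised glyph as rows of 0/1 (1 = black pixel).
Bitmap = list[list[int]]


def has_horizontal_line(img: Bitmap, min_frac: float = 0.8) -> bool:
    """True if some row has at least min_frac of its pixels black (a crude line cue)."""
    width = len(img[0])
    return any(sum(row) >= min_frac * width for row in img)


def has_vertical_line(img: Bitmap, min_frac: float = 0.8) -> bool:
    """True if some column has at least min_frac of its pixels black."""
    height = len(img)
    return any(sum(col) >= min_frac * height for col in zip(*img))


def black_ratio(rows: Bitmap) -> float:
    """Fraction of black pixels in a block of rows."""
    return sum(map(sum, rows)) / sum(len(r) for r in rows)


def char_code(img: Bitmap) -> str:
    """Map one character bitmap (height >= 2) to a 3-symbol primitive code (hypothetical scheme)."""
    h = "H" if has_horizontal_line(img) else "-"
    v = "V" if has_vertical_line(img) else "-"
    half = len(img) // 2
    top, bottom = black_ratio(img[:half]), black_ratio(img[half:])
    # Ratios rather than raw pixel counts, so the code is largely insensitive to glyph size.
    if top > 1.2 * bottom:
        balance = "T"
    elif bottom > 1.2 * top:
        balance = "B"
    else:
        balance = "E"
    return h + v + balance


def feature_string(word: list[Bitmap]) -> str:
    """Feature string of a word image, given its segmented character bitmaps."""
    return "".join(char_code(img) for img in word)


def build_index(documents: dict[str, list[list[Bitmap]]]) -> dict[str, set[str]]:
    """Offline step: map each word's feature string to the documents containing it.

    An in-memory dict stands in for the temporary files mentioned in the abstract.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, words in documents.items():
        for word in words:
            index[feature_string(word)].add(doc_id)
    return index


def search(keyword: str, glyph_codes: dict[str, str], index: dict[str, set[str]]) -> list[str]:
    """Online step: build the keyword's primitive string and look it up.

    glyph_codes is a hypothetical table from keyword characters to primitive codes.
    """
    primitive = "".join(glyph_codes[ch] for ch in keyword)
    return sorted(index.get(primitive, set()))
```

For example, with `glyph_codes` built by applying `char_code` to one reference bitmap per Tamil character, `search(keyword, glyph_codes, build_index(documents))` returns the IDs of documents containing a word whose feature string equals the keyword's primitive string. Because every code in this sketch has the same length, concatenating codes is unambiguous; Tamil keywords containing vowel signs would need a richer table than one code per code point.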
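Affiliations:
- Department of Computer Science & Engineering, College of Engineering, Anna University, Chennai 600025, India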
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000192
- to stream PascalFrancis, to step Curation: 000585
- to stream PascalFrancis, to step Checkpoint: 000198
- to stream Main, to step Merge: 000A92
- to stream Main, to step Curation: 000A82
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0181228</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0181228 INIST</idno>
<idno type="RBID">Pascal:10-0181228</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000192</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000585</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000198</idno>
<idno type="wicri:doubleKey">0952-8091:2009:Abirami S:feature:string:based</idno>
<idno type="wicri:Area/Main/Merge">000A92</idno>
<idno type="wicri:Area/Main/Curation">000A82</idno>
<idno type="wicri:Area/Main/Exploration">000A82</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Feature string-based intelligent information retrieval from Tamil document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science &amp; Engineering, College of Engineering, Anna University</s1>
<s2>Chennai 600025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Chennai 600025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Character string</term>
<term>Document retrieval</term>
<term>Electronic library</term>
<term>Extraction process</term>
<term>Image analysis</term>
<term>Image processing</term>
<term>Image recognition</term>
<term>Information retrieval</term>
<term>Keyword</term>
<term>Letter</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Optical image</term>
<term>Pattern extraction</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Chaîne caractère</term>
<term>Recherche information</term>
<term>Recherche documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Reconnaissance image</term>
<term>Image optique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Analyse image</term>
<term>Traitement image</term>
<term>Bibliothèque électronique</term>
<term>Lettre alphabet</term>
<term>Procédé extraction</term>
<term>Mot clé</term>
<term>Extraction forme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Recherche documentaire</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Information retrieval (IR) from document images has become an increasingly popular and challenging problem. This paper proposes a simple and effective method for extracting text and performing intelligent IR on Tamil document images without optical character recognition (OCR). The method generates a feature string for every word image by extracting its features. Instead of recognising the letters, as OCR does, it relies on their basic characteristics, that is, the shapes of the letters. The strength of the technique lies in describing text through basic features, such as lines and the black-and-white disposition rates of characters, which remain almost the same for a given character across font sizes and font faces. In an offline process, the document images are preprocessed, features are extracted from the word images on the basis of their shapes, and the results are stored in temporary files. During online retrieval, a textual keyword is obtained from the user and its primitive string is constructed. IR is then performed on the basis of this primitive string, and the matching images are returned to the user. The technique could easily be adopted for IR in large digital libraries.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>
To handle this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A82 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A82 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:10-0181228 |texte= Feature string-based intelligent information retrieval from Tamil document images }}
This area was generated with Dilib version V0.6.32.